This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.

plot(cars)

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

library("nycflights13")
library("tidyverse")
nycflights13::flights
View(flights)
?flights

int stands for integers.

dbl stands for doubles, or real numbers.

chr stands for character vectors, or strings.

dttm stands for date-times (a date + a time).
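The abbreviations in the tibble header correspond to base R vector types. A minimal sketch in base R, using made-up values rather than anything taken from flights:

```r
# Hypothetical values, one for each column type listed above.
int_val  <- 2013L                                           # <int>: integer
dbl_val  <- 12.5                                            # <dbl>: double (real number)
chr_val  <- "UA"                                            # <chr>: character vector (string)
dttm_val <- as.POSIXct("2013-01-01 05:00:00", tz = "UTC")   # <dttm>: date-time

typeof(int_val)   # "integer"
typeof(dbl_val)   # "double"
typeof(chr_val)   # "character"
class(dttm_val)   # "POSIXct" "POSIXt"
```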

jan1 <- filter(flights, month == 1, day == 1)
filter(flights, month == 11 | month == 12)

If you want to determine if a value is missing, use is.na().

filter() only includes rows where the condition is TRUE; it excludes both FALSE and NA values. If you want to preserve missing values, ask for them explicitly:
- add an is.na() test to the condition, for example filter(df, is.na(x) | x > 1) for a data frame df with a column x
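
How filter() treats missing values can be seen on a toy data frame. This is a minimal sketch; df and x are made-up names, not part of flights:

```r
library(dplyr)

# Toy data frame (hypothetical values) with one missing entry.
df <- data.frame(x = c(1, NA, 3))

# x > 1 is FALSE for 1 and NA for the missing value, so both rows are dropped.
kept_default <- filter(df, x > 1)

# Adding is.na(x) to the condition keeps the missing row as well.
kept_with_na <- filter(df, is.na(x) | x > 1)

kept_default$x   # 3
kept_with_na$x   # NA 3
```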

Exercises 5.2:

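filter(flights, arr_delay >= 120) # arrival delay of two or more hours: 10,200 flights
filter(flights, dest == "IAH" | dest == "HOU") # flew to Houston (IAH or HOU): 9,313 flights
airlines # lookup table of carrier codes and airline names
filter(flights, carrier == "DL" | carrier == "AA" | carrier == "UA") # Delta, American or United: 139,504 flights
filter(flights, month >= 7, month <= 9) # departed July to September: 86,326 flights
# the same filter written with between()
filter(flights, between(month, 7, 9))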
summary(flights$dep_time)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      1     907    1401    1349    1744    2400    8255 
filter(flights, dep_time <= 600 | dep_time == 2400) # departed between midnight (recorded as 2400) and 6am: 9,373 flights
filter(flights, is.na(dep_time)) # missing dep_time: 8,255 flights
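
Several of the filters above chain comparisons with | or pair >= with <=. As a minimal sketch on made-up vectors (month and dest here are standalone vectors, not the flights columns), between() and %in% express the same tests more compactly:

```r
library(dplyr)

month <- c(6, 7, 8, 9, 10)        # hypothetical month numbers
dest  <- c("IAH", "HOU", "ORD")   # hypothetical destination codes

# between(x, left, right) is shorthand for x >= left & x <= right.
same_between <- identical(between(month, 7, 9), month >= 7 & month <= 9)

# %in% is shorthand for a chain of == comparisons joined by |.
same_in <- identical(dest %in% c("IAH", "HOU"), dest == "IAH" | dest == "HOU")

same_between   # TRUE
same_in        # TRUE
```

With flights, the Houston filter could then be written as filter(flights, dest %in% c("IAH", "HOU")).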

5.3 arrange() works similarly to filter() except that instead of selecting rows, it changes their order

arrange(flights, year, month, day)

Use desc() to re-order by a column in descending order. Missing values are always sorted at the end, in both ascending and descending order.
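
A minimal sketch of this sorting behaviour on a toy data frame (df and x are made-up names):

```r
library(dplyr)

df <- data.frame(x = c(5, 2, NA))   # hypothetical values

ascending  <- arrange(df, x)                    # NA sorted last
descending <- arrange(df, desc(x))              # NA still sorted last
na_first   <- arrange(df, desc(is.na(x)), x)    # sort on is.na() first to move NA to the top

ascending$x    # 2 5 NA
descending$x   # 5 2 NA
na_first$x     # NA 2 5
```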

5.3 Exercises:

arrange(flights, desc(is.na(dep_delay)), dep_delay) # missing dep_delay first
arrange(flights, desc(is.na(dep_time)), dep_time) # missing dep_time first
arrange(flights, desc(dep_delay)) #longest to shortest
arrange(flights, dep_delay) #shortest to longest
arrange(flights, desc(distance)) #4983 miles
arrange(flights, distance) #17 miles

5.4 select() allows you to rapidly zoom in on a useful subset using operations based on the names of the variables

Some data sets have hundreds or thousands of variables, and not all of them are needed for an analysis. Use select() to look at and work with only the ones that are.

starts_with("abc"): matches names that begin with "abc".

ends_with("xyz"): matches names that end with "xyz".

contains("ijk"): matches names that contain "ijk".
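
A minimal sketch of the three helpers on a toy one-row data frame. The column names mirror a few from flights, and the values are made up:

```r
library(dplyr)

# Toy data frame; values are hypothetical.
df <- data.frame(
  dep_time = 517, dep_delay = 2,
  arr_time = 830, arr_delay = 11,
  air_time = 227
)

dep_cols   <- names(select(df, starts_with("dep")))
delay_cols <- names(select(df, ends_with("delay")))
time_cols  <- names(select(df, contains("time")))

dep_cols     # "dep_time" "dep_delay"
delay_cols   # "dep_delay" "arr_delay"
time_cols    # "dep_time" "arr_time" "air_time"
```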

5.4 Exercises

select(flights, "dep_time", "dep_delay", "arr_time", "arr_delay")
select(flights, dep_time, dep_delay, arr_time, arr_delay)
select(flights, 4, 6, 7, 9) #this one uses column numbers of the variables
select(flights, starts_with("dep_"), starts_with("arr_"))

5.5 It is often useful to add new columns that are functions of existing columns; for this we can use mutate()

flights_sml <- select(flights, 
  year:day, 
  ends_with("delay"), 
  distance, 
  air_time
)
mutate(flights_sml,
  gain = dep_delay - arr_delay,
  speed = distance / air_time * 60
)

5.5 Exercises

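# A bare column name only works inside a dplyr verb; typed on its own, arr_time gives
# Error: object 'arr_time' not found
transmute(flights, air_time, arr_time - dep_time)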
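
The arr_time and dep_time columns hold clock times in HHMM form (see ?flights), so subtracting one from the other does not give a duration in minutes. A minimal sketch of one way to convert them first; hhmm_to_minutes is a hypothetical helper name, not part of dplyr or nycflights13:

```r
# Convert clock times stored as HHMM numbers to minutes since midnight.
# hhmm_to_minutes is a made-up helper for illustration.
hhmm_to_minutes <- function(x) {
  # x %/% 100 extracts the hours and x %% 100 the minutes;
  # the final %% 1440 maps 2400 (midnight) to 0.
  (x %/% 100 * 60 + x %% 100) %% 1440
}

converted <- hhmm_to_minutes(c(517, 1401, 2400))
converted   # 317 841 0
```

Inside mutate() this could be applied as mutate(flights, dep_min = hhmm_to_minutes(dep_time)), where dep_min is likewise an illustrative column name.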

5.6 summarise() collapses a data frame to a single row

summary(flights)
      year          month             day           dep_time    sched_dep_time
 Min.   :2013   Min.   : 1.000   Min.   : 1.00   Min.   :   1   Min.   : 106  
 1st Qu.:2013   1st Qu.: 4.000   1st Qu.: 8.00   1st Qu.: 907   1st Qu.: 906  
 Median :2013   Median : 7.000   Median :16.00   Median :1401   Median :1359  
 Mean   :2013   Mean   : 6.549   Mean   :15.71   Mean   :1349   Mean   :1344  
 3rd Qu.:2013   3rd Qu.:10.000   3rd Qu.:23.00   3rd Qu.:1744   3rd Qu.:1729  
 Max.   :2013   Max.   :12.000   Max.   :31.00   Max.   :2400   Max.   :2359  
                                                 NA's   :8255                 
   dep_delay          arr_time    sched_arr_time   arr_delay       
 Min.   : -43.00   Min.   :   1   Min.   :   1   Min.   : -86.000  
 1st Qu.:  -5.00   1st Qu.:1104   1st Qu.:1124   1st Qu.: -17.000  
 Median :  -2.00   Median :1535   Median :1556   Median :  -5.000  
 Mean   :  12.64   Mean   :1502   Mean   :1536   Mean   :   6.895  
 3rd Qu.:  11.00   3rd Qu.:1940   3rd Qu.:1945   3rd Qu.:  14.000  
 Max.   :1301.00   Max.   :2400   Max.   :2359   Max.   :1272.000  
 NA's   :8255      NA's   :8713                  NA's   :9430      
   carrier              flight       tailnum             origin         
 Length:336776      Min.   :   1   Length:336776      Length:336776     
 Class :character   1st Qu.: 553   Class :character   Class :character  
 Mode  :character   Median :1496   Mode  :character   Mode  :character  
                    Mean   :1972                                        
                    3rd Qu.:3465                                        
                    Max.   :8500                                        
                                                                        
     dest              air_time        distance         hour      
 Length:336776      Min.   : 20.0   Min.   :  17   Min.   : 1.00  
 Class :character   1st Qu.: 82.0   1st Qu.: 502   1st Qu.: 9.00  
 Mode  :character   Median :129.0   Median : 872   Median :13.00  
                    Mean   :150.7   Mean   :1040   Mean   :13.18  
                    3rd Qu.:192.0   3rd Qu.:1389   3rd Qu.:17.00  
                    Max.   :695.0   Max.   :4983   Max.   :23.00  
                    NA's   :9430                                  
     minute        time_hour                  
 Min.   : 0.00   Min.   :2013-01-01 05:00:00  
 1st Qu.: 8.00   1st Qu.:2013-04-04 13:00:00  
 Median :29.00   Median :2013-07-03 10:00:00  
 Mean   :26.23   Mean   :2013-07-03 05:22:54  
 3rd Qu.:44.00   3rd Qu.:2013-10-01 07:00:00  
 Max.   :59.00   Max.   :2013-12-31 23:00:00  
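
For comparison with the base R summary() output above, here is a minimal sketch of dplyr's summarise() on a toy data frame (hypothetical carriers and delays). It returns one row overall, or one row per group when paired with group_by():

```r
library(dplyr)

# Toy data frame (hypothetical delays in minutes) with one missing value.
df <- data.frame(
  carrier   = c("UA", "UA", "AA", "AA"),
  dep_delay = c(10, NA, 4, 6)
)

# na.rm = TRUE drops the missing value; without it the mean would be NA.
overall <- summarise(df, delay = mean(dep_delay, na.rm = TRUE))

# With group_by(), summarise() returns one row per carrier instead.
by_carrier <- summarise(group_by(df, carrier), delay = mean(dep_delay, na.rm = TRUE))

overall$delay        # 6.666667
by_carrier$carrier   # "AA" "UA"
by_carrier$delay     # 5 10
```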
                                              